Approximate Bayesian Computation via Regression 

Density Estimation 



Y. FanJ] D. J. Nonf] and S. A. SlSSON 



Abstract 



Approximate Bayesian computation (ABC) methods, which are applicable when the like- 
lihood is difficult or impossible to calculate, are an active topic of current research. Most 
current ABC algorithms directly approximate the posterior distribution, but an alterna- 
tive, less common strategy is to approximate the likelihood function. This has several 
advantages. First, in some problems, it is easier to approximate the likelihood than to 
approximate the posterior. Second, an approximation to the likelihood allows reference 
analyses to be constructed based solely on the likelihood. Third, it is straightforward to 
perform sensitivity analyses for several different choices of prior once an approximation 
to the likelihood is constructed, which needs to be done only once. The contribution of 
the present paper is to consider regression density estimation techniques to approximate 
the likelihood in the ABC setting. Our likelihood approximations build on recently devel- 
oped marginal adaptation density estimators by extending them for conditional density 
estimation. Our approach facilitates reference Bayesian inference, as well as frequentist 
inference. The method is demonstrated via a challenging problem of inference for stere- 
ological extremes, where we perform both frequentist and Bayesian inference. 

Keywords: Approximate Bayesian computation; Copulas; Likelihood-free inference; Mu- 
tivariate density estimation; Regression density estimation. 



1 Introduction 



Approximate Bayesian computation (ABC) methods (commonly described as "likelihood- 
free" methods) have been attracting increasing research interest as a viable procedure for 
performing Bayesian inference in the presence of computationally intractable likelihood 



functions. Initially popular in the biological sciences (Beaumont et al. 2002 JLuciani et aT 



2009 



Ratmann et al. 2007 Ratmann et al. 2009}, ABC ha s now found application in a 



wide range of areas - see the reviews by Beaumont (2010) Csillery et al. (20iO)| and |Sis^| 



son and Fan (2011) 



ABC methods require the ability to simulate data from the intractable likelihood, 
L{-\9), for a given parameter vector 9. An integral part of all existing algorithms in- 
volves sampling 9 from some distribution (typically the prior, tt{9)) and comparing sim- 
ulated data x ~ L(-\9) with the observed data xo. Posterior samples approximately from 
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7r(0|xo) are then obtained by weighting 9 according to some weighting function or ker- 
nel ifj([|x — xq||) with scale parameter 5 > 0. Accordingly, samples for which x«xo 
receive larger weights than those where x is very different to xo. 

More formally, ABC algorithms produce samples from an augmented posterior dis- 
tribution 

7r ABC (0,x|x o ) ociQ(||x-x o ||)L(x|0)7r(0). (1) 

It is easy to see that as 5 — > 0, only x rj xo is given non-trivial weight by K$, and the ABC 
posterior converges to the true posterior distribution, 7r(0|xo). However, 5 = practi- 
cally equates to repeated simulation until x = xo is matched exactly for each 6, which 
is computationally impractical in general. As such, for some 5 > 0, a necessarily ap- 
proximate posterior is obtained. Regression-adjustment strategies have been proposed to 
post-process the approximate posterior samples to correct the error in the ABC posterior 
(e.g. |Beaumont et al. 2002| |Blum and Francois 20iu) |Nott et al. 2012| . Dimension re- 



duction techniques are also typically used to reduce the overall computational overhead. 
Here, the vectors, x, are replaced by lower-dimensional sufficient or summary statistics, 



S = S(x). Readers should refer to Blum et al. (2012) Fearnhead and Prangle (2012) and 



Robert et al. (2011) for further discussion of the use and elicitation of summary statistics 



in ABC. 

In this paper, we present an alternative ABC approach based on constructing a di- 
rect approximation to the likelihood. The approximation is based on samples 0® (not 
necessarily from the prior), and the corresponding summaries S^, simulated from the 
intractable likelihood. Such an approach is attractive in several ways. Firstly, it is some- 
times easier to approximate the likelihood than to approximate the posterior directly. 
Secondly, separate estimation of the likelihood is useful for performing purely likelihood 
based analyses, such as maximum likelihood estimation. This may be more attractive to 
researchers more familiar with frequentist inference. Thirdly, direct approximation of the 
likelihood is useful for diagnosing prior-likelihood conflict, or for running many analyses 
with different priors in a sensitivity analysis, as the likelihood approximation only needs 
to be constructed once. A further example of a Bayesian use for a separate estimate of 
the likelihood is the construction of credible regions based on contours of the likelihood 
- these have a robust Bayes interpretation in terms of posterior probability content be- 
ing minimally sensitive to perturbations of the prior ( |Wasserman 1989| ). The method we 
propose here provides an explicit analytical expression for the likelihood function. It also 
does not require simulation from the prior, which is typically required by many ABC al- 



gorithms (see e.g. Pritchard et al. 1999 Tavare et al. 1997 Sisson et al. 2007 1. This is 



useful where there is interest in using reference priors. 

An early discussion of the concept of directly approximating an intractable likelihood 
function using simulated data sets was given by Diggle and Gratton (1984)} predating the 



current wave of activity in ABC research. More recently, several researchers have consid- 



ered direct likelihood approximations within ABC. Leuenberger and Wegmann (2010) 



developed an approach that involves embedding the regression adjustment of Beaumont 



et al. (2002) into a likelihood approximation based on a generalised linear model. Wood 
(2010)|considers a "synthetic likelihood" which involves estimating a normal distribution 



for the summary statistics, S, with mean and covariance depending on 9. The approx- 
imation of the mean and covariance is performed within a Markov chain Monte Carlo 
(MCMC) scheme, and through clever choice of summary statistics and quantile transfor- 
mations it may be possible to improve the approximation to normality. The approach we 
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propose is also related to that introduced by Bonassi et al. (2011) although this method 



does not directly approximate the likelihood. They propose to use mixtures of multi- 
variate normals to estimate a joint distribution for (0, S) where the summary statistics 
are simulated from an informative prior, and then condition on the observed summary 
statistic in the estimated mixture model. Appropriate localization and choice of summary 
statistics can help to improve the efficiency of the approach. 

We note that a direct approximation of the likelihood function is available by Monte 
Carlo integration (e.g. Sisson et al. 2008 Sisson and Fan 20 11 [ I. If we consider marginal- 



ising the distribution of the augmented posterior (|lj| with respect to the auxiliary data x, 
we then have the marginal posterior 

7T ABC (9\x ) oc n(0) [ ^(Hx-xolD^xl^x^^V^dlxW-xoll) (2) 

where x^ 1 ), . . . , x^, are samples from L(x|0). The summation term is a simple Monte 
Carlo estimate of the likelihood of xo at the point 0. Clearly this estimate is a function of 
5, and so posterior inferences will still require regression-adjustment post-processing. It 
is also a pointwise estimate, and so it must be effectively re-estimated for each likelihood 
evaluation in an analysis. Our approach results in a stand-alone functional expression 
for the likelihood function at 5 = 0. However, in the context of this paper, the estimator 
§2§ could usefully serve as a goodness-of-fit diagnostic. 

Our proposed approach to conditional density estimation for ABC (that is, estimat- 
ing the distribution of 0|S given the joint samples (0, S)) builds on recent work on flex- 
ible multivariate density estimation - in particular the marginal adaptation method of 



Giordani et al. (2012) The basis of their approach argues that estimation of univariate 
marginal distributions is easier than estimating a full multivariate density estimation di- 
rectly. Hence, a fruitful strategy is to adapt a multivariate density estimate to have given 
marginals, which are more precisely estimated individually. In this paper, we develop a 
strategy for implementing this approach for the related problem of conditional density 
estimation, in the context of likelihood estimation in ABC. 

In Section |2j we develop the estimator for the intractable likelihood function, and 
Section|3]outlines some connections with existing ABC approaches. Section [4] discusses a 
challenging example of inference for stereological extremes and Section [5] concludes. 



2 ABC via multivariate regression density estimation 

Our goal is to obtain an estimate of the likelihood L(S\0), denoted by L(S\0), and then 
approximate the intractable posterior distribution by 

tt(0|S o ) oc L(So|0)tt(0), 

where So = S(xo) denotes the summary statistics from the observed data. Once L(Sq\0) 
is obtained, drawing samples from 7r(0|So) can be achieved by any standard posterior 
simulation algorithm e.g. Markov chain Monte Carlo. We assume that the vector of sum- 
mary statistics, S = (S 1 , . . . , S k ) J , are given, with dim(S) = k and that dim(0) = d. 
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The likelihood estimate is constructed from samples (S^ , 0^) for i = 1, . . . , N, where 
(i) ~ h(0) and SW = S(xW) with x« ~ L(-|0 (i) ). The distribution function deter- 
mines the region of parameter space in which we require L(So\0) to be a good approx- 
imation of L(So|0). This must necessarily include the region of high posterior density. 
One option, where this is convenient, is to perform an initial pilot ABC analysis with a 
large value of 5 to broadly identify the high posterior density region, and then define h(0) 



to be proportional to the prior tt(0) in this region (Fearnhead and Prangle 2012 1. 



We first consider flexible estimation of the individual marginal distributions \ for 
each j = 1, . . . , k. Estimation of marginals is typically easier than estimation of the joint 
distribution due to the lower dimensionality, and there are many different methods avail- 



able for this purpose. Here we adopt a mixture of experts approach (Jacobs et al. 1991 



Jordan and Jacobs 1994[ >. Mixtures of experts models are mixtures of regression mod 



els where both parameters in the component response distributions and mixing weights 
are allowed to vary with covariates. In our case, for each , we express the marginal 
distribution of 5 J 1 as a mixture of J normal distributions 

J 

fj(S j \0) = ^ Wjl (0)N (^(0),4(0)) (3) 



£=1 



where 



w je (0) 



ex P (#+e* T g 



Here $ G R and ^ G R d , for j = 1, . . . k, I = 1, . . . J with $ = 0, & 1 = for identifia- 
bility, and 

Ht (0) = $ + 0^0, log (4(0)) = 7 ? + <y^0, 

with /3jj , G and 7^ G R d . For a sufficiently large sample size (N) and number 
of components (J), the model ^ can approximate the marginal distribution of S^\0 arbi- 
trarily well. Methods for fast and parsimonious fitting of ^ can be found in Nott et al. 
(2012)| and [Tranet al. (20127 



For the estimation of the joint dependencies, we propose a mixture of Gaussian cop- 



ulas approach similar to that proposed by Giordani et al. (2012) for density estimation. 
More precisely, for joint samples of (S^, W ) we firstly transform the fitted margins of 
5 J to be standard normal, so that 

W =<S>- 1 (F J (&\0)), (4) 

where U = (U 1 , . . . , U k ), Fj denotes the cumulative distribution function for the fitted 
density fj(-) in ^ and <3? denotes the standard normal cumulative distribution function. 
We then fit a mixture of multivariate normals to the joint samples (U, 0), and obtain the 



conditional distribution of U|0 in this mixture (see e.g. Norets and Pelenis 2012 . In 
more detail, the joint density of (U, 0) is modelled via a mixture of multivariate normal 
distributions 

L 

E< 

e=i 



g(U, 0) = V u e N k+d (v t , E<), (5) 
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where Yle=i = I and denote the mean and covariance matrices of the normal 

mixture components. The conditional density g(U|0) is then given by 

L 

g(\J\9) = ^2oj c e N k (u c e ,^), (6) 
1=1 

where v\ and Sl^ are the mean and covariance matrix of the conditional distribution given 
in the the ^-th mixture component, Nk + d{v£, 5^). The mixing weights for the condi- 
tional distribution, uj, are 



Zti^<i>(0,MO),^(o)) 



where (j) (0, vg(0), 5^(0)) denotes the marginal density for in the normal component 
Our density estimator for the likelihood is then given as 

where 4>(U^ , 0, 1) denotes a standard normal density evaluated at W. Note that L(S\9) 
does not have exactly fj{S^\9) as it's marginal distribution, since the marginal distribu- 
tion for W under g(XJ\9) is not exactly standard normal, unless L = 1. For this reason, 



Giordani et al. (2012) suggest recomputing the distribution of U 3 using another normal 
mixture, which would lead to replacing 4>{U^ , 0, 1) with this distribution in This will 
improve the quality of the likelihood approximation, albeit at additional computational 
cost. However in our experience we have found that an acceptable approximation is ob- 
tained even without enforcing the marginals fj(S^\9) exactly. 

A key component of the above approach is the transformation (S, 0) — >■ (U, 0). If the 
summary statistics are too highly correlated, or if the dependence between the statistics 
and the parameters is too complex, it will be difficult to capture the joint distribution of 
(S, 9) well in a simple mixture model. The primary role of the flexible mixture of experts 
approach ^ is to construct the transformation (S, 9) — > (U, 9) to simplify this complex- 
ity. However, such modelling can be greatly aided by careful initial selection of summary 



statistics to achieve near orthogonality and low dependence (e.g. Blum et al. 2012 Fearn- 
head and Prangle 2012} . 



A step by step description of how to of construct L(S\0) is: 
Step 1 Sample (S^,0 (i) ) ~ L(S\9)h(9) for i = 1, . . . , N. Define S« = . . . ,S^ k ). 

Step 2 For each S\ fit the mixture model ^ to the marginal samples (S^ , 0®). 

Step 3 Compute the transformations = (Fj(S^\9)), for j = l,...,k, i = 

1,...,N. Define U« = (U® 1 , . . . , U^ k ). 

Step 4 Fit the mixture model M to the samples (UW, (1) ), . . . , (U^, (Ar) ) and derive 
the model g(XJ\0) from $f 



5 



Step 5 Combine the results in Steps 2, 3 and 4 to obtain the final estimate for the likeli- 
hood L(S|0) via Q. 

The estimate, L(S\0), can then be used as part of any standard Bayesian or frequentist 
analysis. 

3 Connections with other methods 



Bonassi et al. (2011) provide a method to directly estimate the posterior density. A mix- 
ture of multivariate normals is used to approximate the joint distribution of (S, 0) based 
on samples (S^,0^) ~ L(S\0)it(0), where the prior must be informative. Condition- 
ing this mixture on S = So then provides an estimate of the posterior. To the best of 
our knowledge, this is the only approach that directly uses non-parametric conditional 
density estimation techniques in ABC. Note that alternatively conditioning the mixture 
approximation of (S, 0) on would result in a mixture of normals conditional density 
estimate for the likelihood. Similarly, we may derive a direct estimate of the posterior 
similar to (|7j| by conditioning on S rather than 0, although the sampling distribution h(0) 
must then equal the prior tt(0). Arguments for estimating likelihoods, rather than poste- 
rior distributions are provided in the Introduction. 



Giordani et al. (2012) discuss the performance of the mixture of multivariate normals 
density estimator, and observe that it can be difficult to estimate the marginals well using 
this approach. Our conditional estimator |7|) uses a mixture of normals to estimate the 
joint distribution of transformed data. The benefit of the transformation (S, 0) — >■ (U, 0) 
is that it is simpler to estimate the joint distribution of (U, 0) than that of (S,0). The 
mixture of normals model also provides greater flexibility over Gaussian copula models 
which would simply fit a normal distribution to the transformed data. 

There are clear benefits in using more flexible density estimation methods. For ex- 
ample, the flexibility in the mixture of experts model <j3j> to describe the relationship be- 
tween S 1 - 7 and 0, means that good transformations (S, 0) — > (U, 0) can be constructed for 
a wide range of vectors drawn from h{0). As such, we can obtain good approximations 
to L(S|0) for a wide range of as long as the regression models in (|3j are adequate and 
enough samples (SW, 0^') are available. Clearly, well-chosen summary statistics can also 
make the modelling easier, especially when they behave nearly linearly with the param- 



eters. See Blum et al. (2012) for a review of dimension reduction methods for ABC. 



Finally, we note that if the initial sample (S^, 0W) is obtained via a pilot ABC anal- 
ysis with a moderately large kernel scale parameter 5, then our approach can be viewed 
as another form of ABC posterior adjustment method. This is in the sense that the initial 
samples are used to estimate the likelihood at the point 5 = 0, from which (new) samples 
are subsequently obtained. See e.g. |Leuenberger and Wegmann (20 10)] and other regres- 
sion adjustment postprocessing methods (Beaumont et al. 2002: Blum and Francois 2010 
Nott et al. 2012[ ) for further information. 



4 Example: Inference for stereological extremes 



We now present a reanalysis of a stereological problem studied by jBortot et al. (2 007) 



focusing on the production of clean steels. Here inference is required on the size and 
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intensity of the largest microscopic particle inclusions in a 3-dimensional block of steel, 
based on inclusion cross sections observed in a 2-dimensional planar slice. The observed 
data consist of 112 inclusions intersecting the planar slice, with cross sectional diameters 



above a measurement threshold of uq = 5/im. We follow Bortot et al. (2007) and consider 
two models. The first assumes a Poisson process with rate A for inclusion centres, and 
a size distribution for the inclusions, which are assumed to be spherical, based on uni- 
variate extreme value theory. Here, conditional on exceeding the threshold, uq, inclusion 
sizes are assumed to follow a generalised Pareto distribution with scale and shape param- 
eters a > and £. In this setting it is possible to perform standard Bayesian inference, 
and so the correct posterior distribution is available. The second model is the same as the 
first, but assumes that the inclusions are ellipsoidal. For each model we have = (A, a, £). 



The analysis of Bortot et al. (2007) took the number of inclusions, and 112 equally 
spaced quantiles, q^, . . . , q^iu), as summary statistics. Here, as the quantiles will be 
highly correlated, we instead use Si = log(q(j +1 ) — qu\) for j = 1, . . . , 111, where S 112 
is the log number of inclusions. In order to determine the distribution h{9) in regions 
of non-negligible posterior density, we found it convenient to first draw samples from 
I(\\S — So|| < 5)L(S\9)U(0), where || • || denotes squared Euclidean distance, for a large 
value of 5 = 20, where U{9) is uniform over a moderately large region of parameter space 
expected to contain the posterior. We then defined 

h{9) = N(fi, t)I ((0 - A) T S _1 (« - A) < 3d) 

as a truncated normal distribution, where fi and S denote the mean and covariance of the 
^-component of these samples. In this manner, we obtained a final sample (S^, 0®) ~ 
L(S\0)h(0) of size 5,000 for each model. 

In the following, we demonstrate the performance of the regression density estima- 
tion methodology using the above summary statistics (corresponding to conditional den- 
sity estimation in 115 dimensions), as well as the "semi-automatic" summary statistics of 



Fearnhead and Prangle (2012) which provide one summary statistic per parameter (i.e. 
6-dimensional density estimation). In addition to standard ABC posterior inference we 
also perform a prior sensitivity analysis and a maximum-likelihood based inference. 

4.1 Results 

To fit the mixture of experts marginal distributions fj (p), we adopted the variational 



Bayes approach of Nott et al. (2012)| and used the variational lower bound to select the 
number of mixture components. For both spherical and ellipsoidal models we found 
J = 3 mixture components worked well for each of the 112 summaries, although J = 6 
components were required for the 3 semi-automatic statistics. Note that the fitting of 
the marginal models, fj, is highly parallelizable, and is accordingly suitable for appli- 
cations in high dimensions. Figure [l] shows the relationship between the parameters 
(A, a, £) and the three semi-automatic summary statistics, as observed in the sample 
(SW,0w) ^ L{S\0)h(0) for the ellipsoidal model. There are visibly strong relationships 
both within the summary statistic and parameter vectors, and between them. Figure [2] 
displays the same information following the transformation (S, 6) — > (U, 9). Clearly, the 
transformation has greatly simplified the underlying dependence structure. As a result, 
while the joint distribution of (S, 9) could be reasonably well modelled by a mixture of 



multivariate normal distributions (e.g. Bonassi et al. 2011 1, a better fit is likely to be ob 
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tained by alternatively fitting to (U, 0). 



Figure |3] shows the fitted marginal density estimate, fj(S^\0), for ,j = 5, 40, 80, 110 
under the spherical model, evaluated at 0q = (Ao,0"o>£o) = (30,1.5,-0.05). The his- 
togram corresponds to the true density, obtained from data generated under the model 
L(S\0q) at the point 0q, and the solid line denotes the fitted density. The plots indicate 
a very good fit, and this was typical of all other summary statistics, S^, and conditioned 
parameters, 0$, examined. These results indicate that marginal densities can be very well 
estimated. Indeed, marginal densities are much easier to estimate in general than joint 
densities. 

We used the R package Mclust to fit the joint distribution <?(U, 0) (|5j to the trans- 
formed samples, using L = 5 mixture components for both spherical and ellipsoidal 
models. Model selection for the number of mixture components is typically performed 
using the BIC. However care should be taken in high dimensions, as the dimension of 
(U, 0) can grow quickly, and the BIC tends to overpenalise the number of parameters. 



Giordani et al. (2012) use the variational Bayes approximation method to fit the mixture 



model, and advocate the use of the variational lower bound for model selection. 

The histograms in Figure [4] shows the regression density estimates of the marginal 
posteriors for the spherical inclusion model, based on 500,000 MCMC samples, obtained 
under the prior specification log A ~ N(0, WO 2 ), a ~ Ga(0.01, 0.0001) and £ ~ N(0, 100 2 ). 
The top panels are based on using 112 summary statistics, with the middle panels using 
the 3 semi-automatic summary statistics. The solid line indicates a density estimate of the 
true posterior marginal distribution derived from MCMC output using the known like- 
lihood for the spherical inclusions model. The quantile-quantile plots compare the pos- 
terior estimated with the semi-automatic statistics with the true marginals. Even when 
constructing regression density estimates of the likelihood in 115 dimensions, reasonably 
good marginal parameter estimates are obtained, although they are clearly not perfect. 
The estimates improve considerably when only using 3 summary statistics, as the regres- 
sion density estimation is then performed in only 6 dimensions. Clearly, the regression 
density approach has worked well, even in high dimensions. Recall that these approx- 
imations are obtained based on flexible modelling with only 5,000 well placed samples 
(S«,0 W ) (though there are some computational overheads to identify where to place 
these). This compares favourably with earlier analyses of these data | |Bortot et al. 2007) 
with the number of simulations in the tens of millions. 

The histograms in Figure [5] display the marginal posterior regression density esti- 
mates for the ellipsoidal model. Again, the top panels are based on using 112 sum- 
mary statistics, with the bottom panels using the 3 semi-automatic summary statistics. 
The solid lines represent the estimate of the posterior distribution obtained using the 
MCMC-based ABC method of |Bortot et al. (2007)| with kernel scale parameter 5 = 3.3. 



Under the spherical model, this method is able to correctly reproduce the true poste- 



rior for 5 = 3.3 (or lower). Bortot et al. (2007) then assumed that this approach could 



correctly reproduce the true posterior for the ellipsoidal model, although they were un- 
able to verify their results by reducing 5 further. From Figure |5j as with the spherical 
inclusions model, while the marginal posterior distributions under the regression den- 
sity approach using the 112 summary statistics appear less accurate than when using 
the semi-automatic statistics, they still have broadly the right shape and location. Of 
course, because of the lower-dimensional regression density modelling, when using the 
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3-dimensional semi-automatic statistics, the resulting density estimates are closer to the 



MCMC-based estimate of Bortot et al. (2007) albeit obtained with a far smaller computa- 



tional overhead. In summary, these results appear to broadly support the previous study 
of the ellipsoidal model by Bortot et al. (2007)} althoug h there is some indication that 



the posterior marginals obtained by jBortot et al. (2007) may be too wide, especially for 



A. This would be consistent with the regression density estimate's construction at 5 = 



compared with that of 5 = 3.3 for Bortot et al. (2007) 



As previously discussed, once the estimate of L(S\6) is obtained, it is then a simple 
matter to perform a range of likelihood-based inferences that are not immediately avail- 
able with standard ABC methods. Table [T] provides numerical estimates of the mean 
and the 0.025 and 0.975 quantiles of the marginal posterior for each parameter under 
various models and analyses, based on the semi-automatic summary statistics. The 
estimates (\ m i e , cr m ie, £mie) denote the maximum likelihood estimates based on L(S\0), 
obtained by maximising the analytical expression for the approximate likelihood using 
the optim function in R. These frequentist estimates are consistent with their Bayesian 
counterparts in Table [T] due to the uninformative prior implementation. The estimates 
(AabC; &abc, £,abc) indicate those obtained using the MCMC-based ABC method of 



Bortot et al. (2007) with kernel scale parameter 8 = 3.3. These analyses use the 112- 
dimensional summary statistics. For the spherical model, this ABC posterior is known to 
coincide with the true posterior, and this is in evidence in the results, although the upper 
tail for Xabc is estima ted slightly shorter. As with Figure |5j the marginal posteriors for 

the"" 



Bortot et al. (2007) analysis are wider than those obtained from L(S\0). 



Finally, the estimates (A a ,cr a ,£a) correspond to those obtained under the regression 
density approach for different choices of the prior for £ ~ N(0, a 2 ), for a = 100, 1, 0.5 



and 0.1. The original analysis of Bortot et al. (2007) with a = 100 did not incorporate 



any information about the upper-tail heaviness of inclusion sizes: £ > 2 is physically un- 
likely, as this would result in extremely heavy tails; £ < — 1 is also unlikely as this would 
result in a very clear maximum inclusion size truncation. It is computationally trivial to 
implement this prior sensitivity analysis once the estimate L(S\0) has been obtained, in 
contrast with standard ABC approaches. The sensitivity analysis in Table[l]indicates that 
the vague prior (a = 100) does not affect the conclusions, with differences in the posterior 
only emerging for very strong prior beliefs (a = 0.1). 

For both spherical and ellipsoidal inclusion models we additionally constructed a 
likelihood estimate by simply fitting a mixture of multivariate normals to the samples 
(SW,#W) and then conditioning on 0. This is equivalent to our regression density ap- 



proach without the transformation (S, 0) — > (U, 0), or the method of Bonassi et al. (2011) 
but conditioning on to obtain a likelihood, rather than on S to produce a posterior. For 
both models this approach performed reasonably well when the 3 semi-automatic statis- 
tics were used (results not shown). However, it became numerically unstable in the 115 
dimensional setting. That the regression density approach is numerically stable even in 
high dimensions indicates the importance of the flexible mixture of experts transforma- 
tion, (S, 0) -)■ (U, 0), in this setting. 
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5 Discussion 

Approximate Bayesian computation using an estimated likelihood function has many 
advantages over direct estimation of the posterior. As such, we argue that methods that 
focus on likelihood construction should be considered an important part of the ABC tool- 
box. While conditional density estimation in high dimensions is challenging, it is an ac- 
tive area of current research. As the reliability and flexibility of these techniques improve, 
this will only enhance the value of the likelihood-focused approach to ABC. 
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Figure 1: Pairwise scatterplots between the three semi-automatic summary statistics 
5 1 , S 12 , 5 3 (Fearnhead and Prangle, 2012) and the three parameters A, <r, £ for the ellip- 
soidal inclusions model. 
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;ure 2: Pairwise scatterplots between the mixture of experts transformed statistics, 
,U 2 ,U 3 and the three parameters A, a, £ for the ellipsoidal inclusions model. 
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Table 1: Posterior summaries for = (A, a, £) for the spherical (left) and ellipsoidal (right) 
models. Columns correspond to the estimated 0.025 and 0.975 quantiles and the posterior 
mean for each parameter, under each model. The mle parameter suffix indicates results 
from maximum likelihood estimation (giving the MLE and lower and upper 95% con- 
fidence limits), ABC indicates estimates from the ABC posterior using the approach of 
Bortot et al. (2007) with 5 = 3.3. A numerical suffix (e.g. A a , a a , £ a ) indicates the prior for 
£ is iV(0, a 2 ) under the regression density approach. 
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Figure 3: Mixture of experts model fits of fj(S^\9) for j = 5, 40, 80 and 110. Solid line 
indicates fitted density at the point 0q = (Ao, cro 5 Co) = (30, 1.5, —0.05). Histogram denotes 
the true density obtained by drawing samples from L(S J : \0q) at the point 0q. 
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Figure 4: Estimated marginal posteriors for the spherical inclusions model. Solid lines 
correspond to true marginal density, obtained using standard MCMC. Histograms in- 
dicate marginal posteriors based on regression density estimation using: (top row) 112 
summary statistics and (middle row) the 3 semi-automatic summary statistics (Fernhead 
and Prangle, 2012). Bottom row shows quantile-quantile plots of the 3 summary statistic 
approximation to the true marginal distributions. 
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Figure 5: Estimated marginal posteriors for the ellipsoidal inclusions model. Solid lines 
correspond to the marginal density obtained by the ABC-MCMC approach of Bortot et 
al. (2007), with kernel scale parameter 5 = 0.33. Histograms indicate marginal posteri- 
ors based on regression density estimation using: (top row) 112 summary statistics and 
(bottom row) the 3 semi-automatic summary statistics. 
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